Inferring and Executing Programs for Visual Reasoning Supplementary Material

Authors

  • Justin Johnson
  • Bharath Hariharan
  • Laurens van der Maaten
  • Judy Hoffman
  • Li Fei-Fei
  • C. Lawrence Zitnick
  • Ross Girshick
Abstract

In all experiments our program generator is an LSTM sequence-to-sequence model [9]. It comprises two learned recurrent neural networks: the encoder receives the natural-language question as a sequence of words and summarizes it as a fixed-length vector; the decoder receives this fixed-length vector as input and produces the predicted program as a sequence of functions. The encoder and decoder do not share weights. The encoder converts the discrete words of the input question to vectors of dimension 300 using a learned word embedding layer; the resulting sequence of vectors is then processed with a two-layer LSTM using 256 hidden units per layer. The hidden state of the second LSTM layer at the final timestep is used as the input to the decoder network. At each timestep the decoder network receives both the function from the previous timestep (or a special token at the first timestep) and the output from the encoder network. The function is converted to a 300-dimensional vector with a learned embedding layer and concatenated with the encoder output; the resulting sequence of vectors is processed by a two-layer LSTM with 256 hidden units per layer. At each timestep the hidden state of the second LSTM layer is used to compute a distribution over all possible functions using a linear projection. During supervised training of the program generator, we use Adam [7] with a learning rate of 5 × 10⁻⁴ and a batch size of 64; we train for a maximum of 32,000 iterations, employing early stopping based on validation set accuracy.
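The dimensions quoted above can be checked with a little bookkeeping. The sketch below is not the authors' code: it assumes the decoder input at each timestep is the 300-dimensional function embedding concatenated with the 256-dimensional encoder summary, and that each LSTM gate has a single bias vector (some libraries keep two). The names `GeneratorConfig`, `lstm_layer_params`, `lstm_stack_params`, and `generator_param_count` are hypothetical, and the vocabulary sizes are placeholders because the text does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    # Values taken from the text above.
    embed_dim: int = 300          # word / function embedding size
    hidden_dim: int = 256         # hidden units per LSTM layer
    num_layers: int = 2           # LSTM layers in encoder and in decoder
    learning_rate: float = 5e-4   # Adam learning rate
    batch_size: int = 64
    max_iterations: int = 32_000


def lstm_layer_params(input_dim: int, hidden_dim: int) -> int:
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and one bias vector (an assumption of this sketch)."""
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim)


def lstm_stack_params(input_dim: int, hidden_dim: int, num_layers: int) -> int:
    """Parameters of a stacked LSTM; layers after the first read hidden_dim inputs."""
    total = 0
    for layer in range(num_layers):
        in_dim = input_dim if layer == 0 else hidden_dim
        total += lstm_layer_params(in_dim, hidden_dim)
    return total


def generator_param_count(cfg: GeneratorConfig,
                          question_vocab: int,
                          function_vocab: int) -> dict:
    """Rough parameter count; both vocabulary sizes are hypothetical inputs."""
    # Encoder: word embedding followed by a two-layer LSTM.
    encoder = (question_vocab * cfg.embed_dim
               + lstm_stack_params(cfg.embed_dim, cfg.hidden_dim, cfg.num_layers))
    # Decoder input: function embedding concatenated with the encoder summary.
    decoder_input_dim = cfg.embed_dim + cfg.hidden_dim
    # Decoder: function embedding, two-layer LSTM, linear projection with bias.
    decoder = (function_vocab * cfg.embed_dim
               + lstm_stack_params(decoder_input_dim, cfg.hidden_dim, cfg.num_layers)
               + cfg.hidden_dim * function_vocab + function_vocab)
    return {
        "encoder": encoder,
        "decoder": decoder,
        "decoder_input_dim": decoder_input_dim,
    }


if __name__ == "__main__":
    # Placeholder vocabulary sizes, chosen only to exercise the arithmetic.
    print(generator_param_count(GeneratorConfig(), question_vocab=100, function_vocab=50))
```

Under these assumptions the recurrent layers dominate the count, and the decoder's first layer is larger than the encoder's because it reads a 556-dimensional input rather than a 300-dimensional one.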




Publication date: 2017